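Course Information
Course Title: Introduction to Information Retrieval and Text Mining (資訊檢索與文字探勘導論)
Semester: 104-1 (Fall 2015)
Intended Audience: Knowledge Management Program
Instructor: 陳建錦
Course Number: IM5030
Course Identifier: 725 U3410
Duration: Half-year (one semester)
Required/Elective: Elective
Class Time: Tuesday, periods 2, 3, 4 (9:10-12:10)
Location: 管二 (College of Management Building 2), Room 305
Remarks: Taught in Chinese, using an English textbook. An elective in the systems area of the Knowledge Management Program. Restricted to third-year undergraduates and above. Enrollment limit: 25 students.
Ceiba Course Page: http://ceiba.ntu.edu.tw/1041IRTM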
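Course Syllabus
To protect everyone's rights, please respect intellectual property and do not make illegal photocopies.
Course Overview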

This course covers the concepts and algorithms of information retrieval and text mining. Theoretical topics include term extraction, term weighting, the vector space model, the binary independence model, language models, IR system evaluation, Naive Bayes classification, Rocchio classification, kNN, k-means, HAC, PageRank, and HITS. In addition, programming assignments and a term project will help students understand how an IR system is developed.
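As a small illustration of two topics listed above, term weighting and the vector space model, the following Python sketch weights each term by raw term frequency times log10(N/df) and compares documents by cosine similarity. The specific weighting variant, the whitespace tokenizer, and the function names `tf_idf_vectors` and `cosine` are assumptions chosen for brevity, not the scheme used in the course's assignments.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term frequency * log10(N / df).
    Assumed tokenizer: lowercase + whitespace split (no stemming or stop words).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log10(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "information retrieval system",
    "text mining system",
    "information retrieval evaluation",
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[2]), cosine(vecs[0], vecs[1]))
```

Because the first and third documents share two terms, their cosine similarity is higher than that of the first and second, which share only "system". A term that appears in every document would receive weight 0 under this idf variant.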

Course Objectives
The course is aimed at graduate and senior undergraduate students who are interested in information retrieval and text mining. The first part of the course covers the basics of information retrieval. Research topics such as text classification and clustering are then discussed, providing a comprehensive study of information retrieval and text mining.
Course Requirements
Prerequisites: programming, data structures, and probability.
References
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, The MIT Press, 1999.
William B. Frakes and Ricardo Baeza-Yates, Information Retrieval: Data Structures and Algorithms, Prentice Hall, 1992.
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 1999.
Course Schedule (Week, Date: Topic)
Week 1, 9/15: The Term Vocabulary
Week 2, 9/22: The Term Vocabulary; PAT Tree and Chinese Keyword Extraction
*** Programming Assignment 1
Week 3, 9/29: Class suspended (typhoon)
Week 4, 10/6: Scoring, Term Weighting and the Vector Space Model
Week 5, 10/13: Evaluation in Information Retrieval
Week 6, 10/20: Probabilistic Information Retrieval
Week 7, 10/27: Probabilistic Information Retrieval; Language Models for Information Retrieval
Week 8, 11/3: Language Models for Information Retrieval
Week 9, 11/10: Midterm exam
Week 10, 11/17: Link Analysis
Week 11, 11/24: Text Classification and Naïve Bayes
Week 12, 12/1: Text Classification and Naïve Bayes
*** Programming Assignment 3
Week 13, 12/8: Vector Space Classification
Week 14, 12/15: Hierarchical Clustering
Week 15, 12/22: Hierarchical Clustering; Flat Clustering
Week 16, 12/29: No class (conference leave)
Week 17, 1/5: Flat Clustering
Week 18, 1/12: Final exam
Week 19, 1/19: IRTM Workshop